As artificial intelligence technology continues to advance, integrating visual and textual data remains a complex challenge. Traditional models often struggle to accurately parse structured visual documents such as tables, charts, infographics, and diagrams. This limitation hampers automated content extraction and comprehension, which in turn affects applications in data analysis, information retrieval, and decision-making. To address this gap, IBM recently released Granite-Vision-3.1-2B, a compact vision-language model designed specifically for document understanding.
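To give a sense of how such a model might be applied to document understanding, here is a minimal sketch using the Hugging Face Transformers library. It assumes the model is published under the repository id `ibm-granite/granite-vision-3.1-2b-preview`, and the image path, prompt text, and generation settings are illustrative placeholders rather than values from the original announcement.

```python
# Minimal sketch: querying a vision-language model about a document image.
# Assumptions: the Hugging Face repo id and the chat-template message format
# shown here; the image path and prompt are hypothetical examples.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "ibm-granite/granite-vision-3.1-2b-preview"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Load a structured document image (e.g., a scanned invoice or report page).
image = Image.open("invoice.png")

# Build a chat-style prompt pairing the image with a question about it.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Summarize the table shown in this document."},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

# Encode the inputs, generate a response, and decode it back to text.
inputs = processor(images=image, text=prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```

In practice, the same pattern extends to other document-understanding tasks (chart reading, key-value extraction from forms) by changing only the prompt text.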